Loan Data from Prosper Exploration by Justinas Marozas

This report explores loan data from Prosper with 113937 observations and 81 data points extracted. As of time of writing, brief descriptions of data points can be found here.

Changes to the dataset and considerations

  • Loans as a domain are largely unknown to me. With 81-odd data points to figure out, this report will be more of a look around than a focused analysis;
  • An observation in the dataset is a snapshot of a loan listing. Time series analysis of a listing is not an option, but by grouping on MemberKey it would be possible to run time series analysis on borrowers. With 90831 unique values of MemberKey, there are few members that would have enough listings to be worth the effort;
  • I’ll convert ListingCreationDate, ClosedDate, DateCreditPulled, LoanOriginationDate, FirstRecordedCreditLine to datetime type;
  • I’ll convert LoanOriginationQuarter to a chronologically ordered factor;
  • I’ll drop “Not Employed” and “Not Displayed” levels from IncomeRange and order it. It would be better to create a separate data point for this, because actual income ranges and the two dropped levels all have distinct meanings, but I’m not particularly interested in that at the moment;
  • I’ll convert ProsperRating..Alpha. and CreditGrade into ordered factors. I’ll also introduce data point CreditGrade.ProsperRating that is a joined version of ProsperRating..Alpha. and CreditGrade, because their levels seem to carry the same meaning and a listing never has both present;
  • I’ll convert Term into an ordered factor as it only has 3 distinct values;
  • I’ll introduce data point ListingCategory based on ListingCategory..numeric. and description in the data point definition;
  • I’ll convert IncomeVerifiable, CurrentlyInGroup, IsBorrowerHomeowner to boolean type;
  • I’ll introduce a past.due.days ordered factor based on LoanStatus.
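
A minimal sketch of how a few of these conversions could look in base R. The `loans` data frame, its three made-up rows, the raw date format and the UTC time zone are all assumptions for illustration; only the column names and the income levels come from this report.

```r
# Toy stand-in for the Prosper data; column names follow the report,
# the values and the raw date format are assumptions.
loans <- data.frame(
  ListingCreationDate    = c("2007-08-26 19:09:29", "2014-02-27 08:28:07",
                             "2012-01-05 10:00:00"),
  LoanOriginationQuarter = c("Q3 2007", "Q1 2014", "Q1 2012"),
  IncomeRange            = c("$25,000-49,999", "Not displayed", "$100,000+"),
  Term                   = c(36, 60, 12),
  stringsAsFactors = FALSE
)

# Datetime conversion (UTC is assumed, the source time zone is not stated).
loans$ListingCreationDate <- as.POSIXct(loans$ListingCreationDate,
                                        format = "%Y-%m-%d %H:%M:%S",
                                        tz = "UTC")

# Chronological ordering of "Qn YYYY" labels: sort by year, then by quarter.
quarters <- unique(loans$LoanOriginationQuarter)
year     <- as.integer(substr(quarters, 4, 7))
quarter  <- as.integer(substr(quarters, 2, 2))
loans$LoanOriginationQuarter <- factor(loans$LoanOriginationQuarter,
                                       levels = quarters[order(year, quarter)],
                                       ordered = TRUE)

# Keep only real income ranges; any value not listed in `levels` becomes NA.
income.levels <- c("$0", "$1-24,999", "$25,000-49,999",
                   "$50,000-74,999", "$75,000-99,999", "$100,000+")
loans$IncomeRange <- factor(loans$IncomeRange, levels = income.levels,
                            ordered = TRUE)

# Term has only three distinct values, so an ordered factor fits.
loans$Term <- factor(loans$Term, levels = c(12, 36, 60), ordered = TRUE)

str(loans)
```

Listing the wanted levels explicitly does the dropping and the ordering in one step, which is why no separate filtering call is needed for IncomeRange.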

Univariate Plots Section

## 'data.frame':    113937 obs. of  84 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : POSIXct, format: "2007-08-26 19:09:29" "2014-02-27 08:28:07" ...
##  $ CreditGrade                        : Ord.factor w/ 8 levels "NC"<"HR"<"E"<..: 5 NA 2 NA NA NA NA NA NA NA ...
##  $ Term                               : Ord.factor w/ 3 levels "12"<"36"<"60": 2 2 2 2 2 3 2 2 2 2 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : POSIXct, format: "2009-08-14" NA ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Ord.factor w/ 7 levels "HR"<"E"<"D"<"C"<..: NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : logi  TRUE FALSE FALSE TRUE TRUE TRUE ...
##  $ CurrentlyInGroup                   : logi  TRUE FALSE TRUE FALSE FALSE FALSE ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : POSIXct, format: "2007-08-26 18:41:46" "2014-02-27 08:28:14" ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : POSIXct, format: "2001-10-11" "1996-03-18" ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Ord.factor w/ 6 levels "$0"<"$1-24,999"<..: 3 4 NA 3 6 6 3 3 3 3 ...
##  $ IncomeVerifiable                   : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : POSIXct, format: "2007-09-12" "2014-03-03" ...
##  $ LoanOriginationQuarter             : Ord.factor w/ 33 levels "Q4 2005"<"Q1 2006"<..: 8 33 6 28 31 32 30 30 32 32 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
##  $ ListingCategory                    : Factor w/ 20 levels "Debt Consolidation",..: NA 2 NA 16 2 1 1 2 7 7 ...
##  $ CreditGrade.ProsperRating          : Ord.factor w/ 8 levels "NC"<"HR"<"E"<..: 5 7 2 7 4 6 3 5 8 8 ...
##  $ past.due.days                      : Ord.factor w/ 6 levels "Past Due (1-15 days)"<..: NA NA NA NA NA NA NA NA NA NA ...
##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##  ListingCreationDate            CreditGrade    Term      
##  Min.   :2005-11-09 20:44:28   C      : 5649   12: 1614  
##  1st Qu.:2008-09-19 10:02:14   D      : 5153   36:87778  
##  Median :2012-06-16 12:37:19   B      : 4389   60:24545  
##  Mean   :2011-07-09 08:33:43   AA     : 3509             
##  3rd Qu.:2013-09-09 19:40:48   HR     : 3508             
##  Max.   :2014-03-10 12:20:53   (Other): 6745             
##                                NA's   :84984             
##                  LoanStatus      ClosedDate                 
##  Current              :56576   Min.   :2005-11-25 00:00:00  
##  Completed            :38074   1st Qu.:2009-07-14 00:00:00  
##  Chargedoff           :11992   Median :2011-04-05 00:00:00  
##  Defaulted            : 5018   Mean   :2011-03-07 19:48:20  
##  Past Due (1-15 days) :  806   3rd Qu.:2013-01-30 00:00:00  
##  Past Due (31-60 days):  363   Max.   :2014-03-10 00:00:00  
##  (Other)              : 1108   NA's   :58848                
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000           C      :18345         Min.   : 1.00  
##  1st Qu.:3.000           B      :15581         1st Qu.: 4.00  
##  Median :4.000           A      :14551         Median : 6.00  
##  Mean   :4.072           D      :14274         Mean   : 5.95  
##  3rd Qu.:5.000           E      : 9795         3rd Qu.: 8.00  
##  Max.   :7.000           (Other):12307         Max.   :11.00  
##  NA's   :29084           NA's   :29084         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           Mode :logical       Mode :logical   
##  1st Qu.: 26.00           FALSE:56459         FALSE:101218    
##  Median : 67.00           TRUE :57478         TRUE :12719     
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey      DateCreditPulled             
##                         :100596   Min.   :2005-11-09 00:30:04  
##  783C3371218786870A73D20:  1140   1st Qu.:2008-09-16 22:31:26  
##  3D4D3366260257624AB272D:   916   Median :2012-06-17 08:01:23  
##  6A3B336601725506917317E:   698   Mean   :2011-07-09 16:14:50  
##  FEF83377364176536637E50:   611   3rd Qu.:2013-09-11 14:31:19  
##  C9643379247860156A00EC0:   342   Max.   :2014-03-10 12:20:56  
##  (Other)                :  9634   NA's   :1                    
##  CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine      
##  Min.   :  0.0         Min.   : 19.0         Min.   :1947-08-24 00:00:00  
##  1st Qu.:660.0         1st Qu.:679.0         1st Qu.:1990-06-01 00:00:00  
##  Median :680.0         Median :699.0         Median :1995-11-01 00:00:00  
##  Mean   :685.6         Mean   :704.6         Mean   :1994-11-17 06:23:33  
##  3rd Qu.:720.0         3rd Qu.:739.0         3rd Qu.:2000-03-14 00:00:00  
##  Max.   :880.0         Max.   :899.0         Max.   :2012-12-22 00:00:00  
##  NA's   :591           NA's   :591           NA's   :697                  
##  CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
##  Min.   : 0.00      Min.   : 0.00   Min.   :  2.00            
##  1st Qu.: 7.00      1st Qu.: 6.00   1st Qu.: 17.00            
##  Median :10.00      Median : 9.00   Median : 25.00            
##  Mean   :10.32      Mean   : 9.26   Mean   : 26.75            
##  3rd Qu.:13.00      3rd Qu.:12.00   3rd Qu.: 35.00            
##  Max.   :59.00      Max.   :54.00   Max.   :136.00            
##  NA's   :7604       NA's   :7604    NA's   :697               
##  OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
##  Min.   : 0.00         Min.   :    0.0             Min.   :  0.000     
##  1st Qu.: 4.00         1st Qu.:  114.0             1st Qu.:  0.000     
##  Median : 6.00         Median :  271.0             Median :  1.000     
##  Mean   : 6.97         Mean   :  398.3             Mean   :  1.435     
##  3rd Qu.: 9.00         3rd Qu.:  525.0             3rd Qu.:  2.000     
##  Max.   :51.00         Max.   :14985.0             Max.   :105.000     
##                                                    NA's   :697         
##  TotalInquiries    CurrentDelinquencies AmountDelinquent  
##  Min.   :  0.000   Min.   : 0.0000      Min.   :     0.0  
##  1st Qu.:  2.000   1st Qu.: 0.0000      1st Qu.:     0.0  
##  Median :  4.000   Median : 0.0000      Median :     0.0  
##  Mean   :  5.584   Mean   : 0.5921      Mean   :   984.5  
##  3rd Qu.:  7.000   3rd Qu.: 0.0000      3rd Qu.:     0.0  
##  Max.   :379.000   Max.   :83.0000      Max.   :463881.0  
##  NA's   :1159      NA's   :697          NA's   :7622      
##  DelinquenciesLast7Years PublicRecordsLast10Years
##  Min.   : 0.000          Min.   : 0.0000         
##  1st Qu.: 0.000          1st Qu.: 0.0000         
##  Median : 0.000          Median : 0.0000         
##  Mean   : 4.155          Mean   : 0.3126         
##  3rd Qu.: 3.000          3rd Qu.: 0.0000         
##  Max.   :99.000          Max.   :38.0000         
##  NA's   :990             NA's   :697             
##  PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
##  Min.   : 0.000            Min.   :      0        Min.   :0.000      
##  1st Qu.: 0.000            1st Qu.:   3121        1st Qu.:0.310      
##  Median : 0.000            Median :   8549        Median :0.600      
##  Mean   : 0.015            Mean   :  17599        Mean   :0.561      
##  3rd Qu.: 0.000            3rd Qu.:  19521        3rd Qu.:0.840      
##  Max.   :20.000            Max.   :1435667        Max.   :5.950      
##  NA's   :7604              NA's   :7604           NA's   :7604       
##  AvailableBankcardCredit  TotalTrades    
##  Min.   :     0          Min.   :  0.00  
##  1st Qu.:   880          1st Qu.: 15.00  
##  Median :  4100          Median : 22.00  
##  Mean   : 11210          Mean   : 23.23  
##  3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :646285          Max.   :126.00  
##  NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $0            :  621   Mode :logical   
##  1st Qu.: 0.140    $1-24,999     : 7274   FALSE:8669      
##  Median : 0.220    $25,000-49,999:32192   TRUE :105268    
##  Mean   : 0.276    $50,000-74,999:31050                   
##  3rd Qu.: 0.320    $75,000-99,999:16916                   
##  Max.   :10.010    $100,000+     :17337                   
##  NA's   :8554      NA's          : 8547                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount LoanOriginationDate           LoanOriginationQuarter
##  Min.   : 1000      Min.   :2005-11-15 00:00:00   Q4 2013:14450         
##  1st Qu.: 4000      1st Qu.:2008-10-02 00:00:00   Q1 2014:12172         
##  Median : 6500      Median :2012-06-26 00:00:00   Q3 2013: 9180         
##  Mean   : 8337      Mean   :2011-07-21 03:44:57   Q2 2013: 7099         
##  3rd Qu.:12000      3rd Qu.:2013-09-18 00:00:00   Q3 2012: 5632         
##  Max.   :35000      Max.   :2014-03-12 00:00:00   Q2 2012: 5061         
##                                                   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
##                                                                          
##            ListingCategory  CreditGrade.ProsperRating
##  Debt Consolidation:58308   C      :23994            
##  Other             :10494   B      :19970            
##  Home Improvement  : 7433   D      :19427            
##  Business          : 7189   A      :17866            
##  Auto              : 2572   E      :13084            
##  (Other)           :10976   (Other):19465            
##  NA's              :16965   NA's   :  131            
##                 past.due.days   
##  Past Due (1-15 days)  :   806  
##  Past Due (16-30 days) :   265  
##  Past Due (31-60 days) :   363  
##  Past Due (61-90 days) :   313  
##  Past Due (91-120 days):   304  
##  Past Due (>120 days)  :    16  
##  NA's                  :111870

There’s a gap between late 2008 and early 2009. It must be related to some data points only having values until or from 2009.

Apart from a slowdown in early 2013, the number of listings has increased almost exponentially since 2009.

Round figures are very popular for original loan amounts.

Stated monthly income distribution is massively skewed, with some people claiming to earn over a million per month.

Borrower rate, APR and lender yield all have very similar values and follow very similar distributions.

Debt to income ratio is capped at roughly 10 (the maximum in the summary above is 10.01), which gives the distribution a fat end to its tail, but the majority of listings have much healthier ratios. Let’s zoom in a bit.

A very smooth skewed distribution here. Nothing unusual about it.

One year term loans are unexpectedly rare.

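All listings have at least 1 investor. Counts pile up at 1 and 2 (the first quartile in the summary above is 2), but the median is 44, so the distribution has a long right tail.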

Debt consolidation is by far the most popular listing category. Counts between categories vary greatly.

“Other” is by far the most common occupation (28617 listings), so a large share of borrowers do not fall under the predefined categories.

Apart from some loan listings actually declaring $0 income, there is nothing surprising in the distribution of income range.

There are some listings that are not fully funded. The tail of the distribution thickens moving away from the main bulk of values. This is curious.

Small number of extreme values. Having a recommendation is very rare.

Short past due periods are more common than longer ones, probably because this catches some people who forgot to pay on time. Past 15 days, the distribution stays fairly flat until it drops off beyond 120 days.

A little over half of the listings are from homeowners.

It’s not typical to belong to a group.

It’s typical to have verifiable income.

Univariate Analysis

What is the structure of your dataset?

113937 observations and 81 data points, plus 3 calculated data points. As of time of writing, brief descriptions of data points can be found here.

What is/are the main feature(s) of interest in your dataset?

With so many data points, the dataset could be split into multiple slices, each telling something important about it. I’m sure I’ve missed some important features, but of the ones I looked at these were the most interesting: LoanOriginalAmount, Term, Investors, PercentFunded, Income, IsBorrowerHomeowner.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Did you create any new variables from existing variables in the dataset?

  • past.due.days ordered factor based on LoanStatus to look more into past due loans;
  • CreditGrade.ProsperRating that is a joined version of ProsperRating..Alpha. and CreditGrade, because their levels seem to carry same meaning and a listing never has both present;
  • ListingCategory based on ListingCategory..numeric. and description in the data point definition to avoid having to memorize arbitrary IDs in order to perform any kind of analysis about it.
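
A rough sketch of how the first two derived variables could be built in base R. The `loans` data frame and its four rows are made up for illustration. The level order is read off the `str()` output above, with the levels after "E" assumed to continue as D, C, B, A, AA.

```r
# Made-up rows; each listing has either CreditGrade or ProsperRating..Alpha.
loans <- data.frame(
  CreditGrade           = c("C", NA, "HR", NA),
  ProsperRating..Alpha. = c(NA, "A", NA, "A"),
  LoanStatus            = c("Completed", "Current",
                            "Past Due (1-15 days)", "Past Due (>120 days)"),
  stringsAsFactors = FALSE
)

# Assumed common ordering, worst to best.
rating.levels <- c("NC", "HR", "E", "D", "C", "B", "A", "AA")

# Take CreditGrade where present, otherwise fall back to the Prosper rating.
joined <- ifelse(is.na(loans$CreditGrade),
                 loans$ProsperRating..Alpha.,
                 loans$CreditGrade)
loans$CreditGrade.ProsperRating <- factor(joined, levels = rating.levels,
                                          ordered = TRUE)

# Only the "Past Due" statuses are kept; every other status becomes NA.
past.due.levels <- c("Past Due (1-15 days)", "Past Due (16-30 days)",
                     "Past Due (31-60 days)", "Past Due (61-90 days)",
                     "Past Due (91-120 days)", "Past Due (>120 days)")
loans$past.due.days <- factor(loans$LoanStatus, levels = past.due.levels,
                              ordered = TRUE)

print(loans[, c("CreditGrade.ProsperRating", "past.due.days")])
```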

Of the features you investigated, were there any unusual distributions?

  • PercentFunded has a tail that thickens moving away from bulk of values;
  • LoanOriginalAmount spikes at round figures.

Bivariate Plots Section

We see the same clusters of observations on round numbers. Dots stack into vertical bars, meaning that most listings fall on a small number of LoanOriginalAmount values.

##                         unique.LoanOriginalAmounts 
##                                               2468 
##                                       all.listings 
##                                             113937 
##        unique.LoanOriginalAmounts.divisible.by.500 
##                                                 68 
## listings.with.LoanOriginalAmounts.divisible.by.500 
##                                             102273

Most loan amounts are divisible by 500: 102273 of 113937 listings (about 90%) use just 68 of the 2468 distinct amounts.
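
Counts like the ones above can be produced with the modulo operator. This is a sketch rather than the code behind the report: the `amounts` vector holds only the first ten LoanOriginalAmount values shown in the `str()` output, so the numbers it prints are for those ten rows, not for the full dataset.

```r
# First ten LoanOriginalAmount values from the str() output above.
amounts <- c(9425, 10000, 3001, 10000, 15000, 15000, 3000, 10000, 10000, 10000)

# TRUE where the amount is a multiple of 500.
divisible <- amounts %% 500 == 0

counts <- c(
  unique.LoanOriginalAmounts                         = length(unique(amounts)),
  all.listings                                       = length(amounts),
  unique.LoanOriginalAmounts.divisible.by.500        = length(unique(amounts[divisible])),
  listings.with.LoanOriginalAmounts.divisible.by.500 = sum(divisible)
)
print(counts)
```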

Investor count tends to increase as loan amount increases, but for most ranges small investor counts are more likely.

Clear diagonal lines mark loans with a small number of investors. We also have a lot of loans where loan amount per investor is very small, indicating that investing a small amount is popular across all loan amounts.
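
The diagonals follow from the arithmetic: for a fixed investor count k, amount per investor equals LoanOriginalAmount / k, which is a straight line with slope 1/k when plotted against loan amount. A small sketch, assuming the plot puts loan amount against amount per investor; the `amount.per.investor` column name is my own, and the rows are the first six listings from the `str()` output above.

```r
# First six listings from the str() output above.
loans <- data.frame(
  LoanOriginalAmount = c(9425, 10000, 3001, 10000, 15000, 15000),
  Investors          = c(258, 1, 41, 158, 20, 1)
)

# Hypothetical helper column: average contribution per investor.
loans$amount.per.investor <- loans$LoanOriginalAmount / loans$Investors

# Loans with a single investor sit exactly on the line y = x,
# loans with two investors on y = x / 2, and so on.
single <- loans$Investors == 1
print(loans[single, ])
print(loans[!single, ])
```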

We can see loans over $25000 only started appearing in 2013.

Loans under $2000 have not appeared since 2011.

The decrease in loan count in early 2013 is visible across all amounts.

Starting in 2013, a small number of investors per loan became much more popular.

The relationship between LoanOriginalAmount and Term looks as expected. Long-term loans tend to be bigger.

Not fully funded listings have a very curious relationship between LoanOriginalAmount and PercentFunded. Overall correlation is weak, but dots fall into diagonal lines.

Diagonal lines seem to arrive to round loan amount values as they approach full funding. These values probably are the amounts originally asked for.
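
One way to test the idea that the diagonals end at the amounts originally asked for is to divide the funded amount by the funded fraction. This sketch assumes that PercentFunded equals LoanOriginalAmount divided by the requested amount, which the data point definitions would need to confirm. The three listings and the `requested.amount` and `is.round` columns are made up for illustration.

```r
# Made-up, partially funded listings (not taken from the Prosper data).
partial <- data.frame(
  LoanOriginalAmount = c(7000, 4250, 12000),
  PercentFunded      = c(0.70, 0.85, 0.80)
)

# Assumption: PercentFunded = LoanOriginalAmount / requested amount,
# so the requested amount can be recovered by division.
partial$requested.amount <- round(partial$LoanOriginalAmount /
                                  partial$PercentFunded)

# If the hypothesis holds, recovered amounts land on round figures.
partial$is.round <- partial$requested.amount %% 500 == 0
print(partial)
```

On the real data, a high share of TRUE values in `is.round` among listings with PercentFunded below 1 would support the reading above.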

LenderYield and PercentFunded are unrelated.

Investors and PercentFunded are not related either.

CreditGrade.ProsperRating and LoanOriginalAmount are related. LoanOriginalAmount increases until rating C and then flattens out.

Some listing categories tend to have much bigger loans than others. Debt Consolidation is surprisingly big. Baby&Adoption loans tend to be even bigger than Wedding Loans.

Not surprisingly, BorrowerAPR and BorrowerRate have a very strong relationship. What’s unexpected is that it neatly falls into a bunch of straight lines.

Homeowners tend to get bigger loans.

Most listing categories have fairly similar popularity between homeowners and non-homeowners. One exception is Home Improvement, with many more homeowners. This makes perfect sense.
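
A comparison like this can also be made numerically with row-wise proportions. The sketch below uses seven made-up listings, so the shares it prints say nothing about the real data; only the column names come from the dataset.

```r
# Made-up listings for illustration only.
loans <- data.frame(
  ListingCategory     = c("Home Improvement", "Home Improvement",
                          "Home Improvement", "Auto", "Auto",
                          "Business", "Business"),
  IsBorrowerHomeowner = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Cross-tabulate category against homeownership.
counts <- table(loans$ListingCategory, loans$IsBorrowerHomeowner)

# margin = 1 normalises each row, giving the homeowner share per category.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Row-wise shares make categories of very different sizes comparable, which matters here because counts between categories vary greatly.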

Borrowers that are in groups tend to get smaller loans.

Borrowers with non-verifiable income tend to have slightly smaller loans. They are probably the first sub-group I’ve looked at that doesn’t have loans over $25000.

Now that’s some seriously extreme outliers. Let’s zoom in.

Borrowers with verifiale income have tighter distribution of stated monthly income.

Looks like EstimatedReturn has little to do with ListingCategory. More popular categories have more outliers.

Credit grade clearly has a relationship with estimated return. The better the grade, the smaller and more neatly distributed the estimated return.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Homeowners seem to get larger loans. They also seem to be getting more loans listed in home improvements, business, debt consolidation and childcare categories. That tells a little about social status of homeowners.

Even though most loans are funded by a single investor regardless of loan amount, having a large number of investors (small contributions) seems to be popular regardless of loan amount as well.

Loan amount tends to be larger for better credit ratings up to C and stays similar for higher ratings. This could be due to an upper limit on loan amounts visible in the dataset.
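
A quick way to check a pattern like this is to compare the median loan amount per rating. The sketch below uses made-up rows, and the level order (HR worst, AA best) is an assumption about how the ordered factor was built.

```r
# Assumed ordering of the joined rating, worst to best.
rating.levels <- c("HR", "E", "D", "C", "B", "A", "AA")

# Toy rows; values are invented for illustration.
loans <- data.frame(
  CreditGrade.ProsperRating = factor(
    rep(rating.levels, each = 2),
    levels = rating.levels, ordered = TRUE
  ),
  LoanOriginalAmount = c(2000, 4000,   # HR
                         3000, 5000,   # E
                         4000, 8000,   # D
                         6000, 10000,  # C
                         7000, 9000,   # B
                         5000, 11000,  # A
                         4000, 12000)  # AA
)

# Median loan amount per rating, reported in level order.
medians <- tapply(loans$LoanOriginalAmount,
                  loans$CreditGrade.ProsperRating,
                  median)

print(medians)
```

In this toy data the medians rise from HR to C and are flat from C upwards, which is the shape described above.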

What was the strongest relationship you found?

BorrowerRate and BorrowerAPR.

Multivariate Plots Section

Even though loan amount per investor tends to be much greater for A, B and C ratings, centers of distributions are similar across credit ratings.

PercentFunded doesn’t depend on Investor count.

Investor count tends to be higher for best credit scores.

Throughout listing categories distribution centers of loan amount per investor look similar. Most categories have strongly skewed distributions.

Personal and student loans seem to have more investors per loan amount.

Regardless of credit grade homeowners tend to get larger loans. Both groups are capped at same amounts per credit grade though.

There are some visible caps for lower credit grades. Starting in 2013, some were increased.

Lower credit grades tend to get smaller loans and more widely distributed estimated returns.

As expected, borrower rate and estimated return seem related. Unexpectedly, a lot of the loans with negative estimated returns have high borrower rates.

Loan amount doesn’t seem related to borrower rate or estimated return, but their variability decreases as loan amount increases.

This is useless at such a tiny size, but we’re looking at 5 variables at once!

It’s still good enough to reassert some previous observations that were difficult to see in previous plots, for example the change of loan amount caps per credit grade over time. We also see that negative estimated returns were a temporary thing and disappeared starting in 2011, together with a lot of extreme estimated return values.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Homeowners tend to get larger loans across credit grades. I was expecting this effect to diminish for higher credit grades, but that was not the case.

Variability of data points like estimated return decreases as loan amount increases.
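
One way to put numbers on this is to bucket loans by amount and compare the standard deviation of estimated return per bucket. This is a sketch on made-up rows; the bucket boundaries and the column name amount.bucket are arbitrary choices for illustration.

```r
# Toy rows; values are invented for illustration.
loans <- data.frame(
  LoanOriginalAmount = c(1000, 2000, 3000, 4000, 12000, 14000, 16000, 18000),
  EstimatedReturn    = c(-0.05, 0.20, 0.02, 0.15, 0.07, 0.08, 0.09, 0.08)
)

# Hypothetical buckets: (0, 10000] and (10000, 20000].
loans$amount.bucket <- cut(
  loans$LoanOriginalAmount,
  breaks = c(0, 10000, 20000),
  labels = c("up to 10000", "10000 to 20000")
)

# Standard deviation of estimated return within each bucket.
spread <- tapply(loans$EstimatedReturn, loans$amount.bucket, sd)

print(spread)
```

A smaller value for the larger bucket would match the narrowing seen in the plots.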


Final Plots and Summary

Plot One

Description One

Lower credit grade borrowers dominate loan amounts under $5000, but quickly disappear as the amount increases.

All densities spike on loan amounts divisible by 5000.
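
The share of loans sitting on those spikes can be estimated with a modulo check. A minimal sketch on made-up amounts follows; on the real data the vector would be the LoanOriginalAmount column.

```r
# Toy loan amounts; values are invented for illustration.
amounts <- c(4000, 5000, 7500, 10000, 10000, 15000, 3000, 25000)

# TRUE where the amount is an exact multiple of 5000.
is.round <- amounts %% 5000 == 0

# Fraction of loans at a multiple of 5000.
share.round <- mean(is.round)

print(share.round)
```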

Plot Two

Description Two

Credit grade clearly has a relationship with estimated return. The better the grade, the smaller and more neatly distributed the estimated return.

Plot Three

Description Three

Lower credit grades tend to get smaller loans and more widely distributed estimated returns.

Better credit grades concentrate at the lower end of estimated return, with the exception of a bunch of negative estimated returns for credit grade HR.


Reflection

This was a very challenging dataset to explore. The main struggle was the analysis-paralysis-inducing number of data points. It feels like my explorations barely scratched the surface.

It was interesting to see hints of lender policy changes in the time series, as well as the tendency to borrow round amounts. These high-density values created difficult overplotting problems.

What was unexpected is how the variability of many variables decreases as loan amount and listing creation date increase. This suggests that the lender has stricter rules for larger loans, and that rules got stricter between 2010 and 2011 in general.

It was a bit surprising to see such a visible relationship between credit grade and estimated return.

There are still plenty of data points to investigate in future work. When I started working on this dataset, I expected to find clues on how to predict how much could be borrowed based on borrower properties, but the dataset doesn’t seem to contain rejected loans, so that is hardly possible, as people don’t always borrow the largest amount possible. This dataset would be good for building models though.